Developing a shortened spine functional index (SFI-10) for patients with sub-acute/chronic spinal disorders: a cross-sectional study

Background Brief whole-spine patient-reported outcome measures (PROMs) provide regional solutions and future directions for quantifying functional status, evidence, and effective interventions. The whole-spine regional Spine Functional Index (SFI-25) is used internationally in clinical and scientific contexts to assess general sub-acute/chronic spine populations. However, to improve structural validity and practicality a shortened version is recommended. This study developed a shortened-SFI from the determined optimal number of item questions that: correlated highly with whole-spine criteria PROMs, moderately with regional-spine, condition-specific, and patient-specific criteria PROMs, and moderately-low with general-health and pain criteria; retained one-dimensional structural validity and high internal consistency; and improved practicality to reduce administrative burden. Methods A cross-sectional study (n = 505, age = 18-87 yrs, average = 40.3 ± 10.1 yrs) of sub-acute/chronic spine physiotherapy outpatients from an international sample of convenience. Three shortened versions of the original SFI-25 were developed using 1) qualitative 'concept-retention' methodology, 2) quantitative 'factorial' methodology, and 3) quantitative 'Rasch' methodology, with a fourth 'random' version produced as a comparative control. The clinimetric properties were established for structural validity with exploratory (EFA) and confirmatory (CFA) factorial analysis, and Rasch analysis. Criterion validity used the: whole-spine SFI-25 and Functional Rating Index (FRI); regional-spine Neck Disability Index (NDI), Oswestry Disability Index (ODI), and Roland Morris Questionnaire (RMQ); condition-specific Whiplash Disability Questionnaire (WDQ); and patient-specific functional scale (PSFS). Floor/ceiling effects were also determined.
A post-hoc pooled international sub-acute/chronic spine sample (n = 1433, age = 18-91 yrs, average = 42.0 ± 15.7 yrs) clarified the findings and employed the general-health EuroQol-Index (EQ-5D) and 11-point Pain Numerical Rating Scale (P-NRS) criteria. Results A 10-item SFI retained structural validity with optimal practicality requiring no computational aid. The SFI-10 concept-retention version demonstrated preferred criterion validity with whole-spine criteria (SFI-25 = 0.967, FRI = 0.810) and exceeded cut-off minimums with regional-spine, condition-specific, and patient-specific measures. An unequivocal one-dimensional structure was determined. Internal consistency was satisfactory (α = 0.80) with no floor/ceiling effect. Post-hoc analysis of the international sample confirmed these findings. Conclusion The SFI-10 qualitative concept-retention version was preferred to quantitative factorial and Rasch versions, demonstrated structural and criterion validity, and preferred correlation with criteria measures. Further longitudinal research is required for reliability, error, and responsiveness, plus an examination of the practical characteristics of readability and administrative burden.

Keywords Spine, Musculoskeletal, Assessment, Patient-reported outcome measure, Functional limitation, Clinimetric

Background
Functional status measurement is frequently determined with patient-reported outcome measures (PROMs) as they provide optimal practicality, statistical coherence, and structural-validity [1]. For patients with spine disorders, there has been a progressive shift toward 'whole-spine' PROMs that measure status as a continuous functional kinetic-chain [2]. These have included the static-PROMs, namely the Extended Aberdeen Spine Pain Scale (EASPS) [3], Functional Rating Index (FRI) [4], and Spine Functional Index (SFI-25) [5], and the Computer Adaptive Testing (CAT) assessed Patient-Reported Outcomes Measurement Information System (PROMIS) for Physical Function (PROMIS-PF) [6,7]. This whole-spine approach has high clinical relevance as a single, practical, psychometrically accurate, whole-spine PROM provides clinicians, researchers, and patients with a reduced administrative burden as multiple PROMs are no longer required for different regions and conditions [3,8,9]. This directly reduces the key barriers to PROM adoption [10,11], complies with the essential nine pragmatic requirements for choosing and using a PROM [12,13], and provides the capacity for a consistent spine single-score, broadened data-pooling, meta-analysis [14], and the capacity to demonstrate whether specific healthcare delivery is effective or not [15].
To balance the psychometrics, practicality, and cultural transferability, any whole-spine PROM must comply with the 'Consensus-based Standards for the selection of Health status Measurement Instruments' (COSMIN) standards [16]. The SFI-25 does this, being stringently developed and initially presented at a conference in 2004, e-published in 2013 after delays due to journal submission processes and PhD by Portfolio requirements, and officially published in 2019 after similar journal-related delays [5]. This eventual peer validation permitted the inclusion of the SFI-25 within a whole-spine static-PROMs systematic review that considered the FRI and EASPS, where both had recognized concerns [8], but which consequently did not include the PROMIS-PF. The FRI critiques were that it be used with caution until more robust studies of high methodological quality are found to support its measurement properties [17], that it has item-construct deficiencies [18], and that it has questionable ability to adequately represent whole-spine problems [8]. The EASPS, with 28-35 questions over four pages, is recognized as cumbersome with questionable COSMIN compliance [8].
The SFI-25 has had seven published validation studies [19-25], with a further comparative validation study under submission [26], and was most recently used in a chronic neck pain study [27]. These cultural-adaptation studies not only adapted and validated the SFI-25 for their specific linguistic and population requirements, but also performed criterion validity with multiple whole-spine, spine-region, general, and condition-specific populations. In each case, the SFI-25 was found preferable to the criteria PROMs that included the Neck Disability Index (NDI) [28], Oswestry Disability Index (ODI) [29], Roland Morris Questionnaire (RMQ) [30], and Whiplash Disability Questionnaire (WDQ) [31]. Additionally, suitable correlation was demonstrated with the patient-specific function scale (PSFS) [5] and EuroQol-Index (EQ-5D) [19,20], but less so with an 11-point pain numerical rating scale (P-NRS) [19] and the SF-36 PF scale [26]. However, the findings on the SFI-25's structural validity were not unanimous, with a shortened version recommended in most studies to improve practicality and structural validity.
The PROMIS-PF, using 'CAT' in varied spine-specific populations [32,33], captures similar information to static-legacy PROMs [34,35] but with greater efficacy and accuracy [6,7,36]. However, many populations lack the computing and internet accessibility necessary for PROMIS-PF, which, coupled with patient settings and computer literacy, must be considered [37]. Additionally, though content validity is sufficient, evidence quality in adult populations is low-moderate, particularly for single body areas and conditions, and elderly minority populations [38]. Further, few spine studies have used the PROMIS-PF as an outcome measure, with substantial variability in domain validity between the PROMIS-PF and criteria static-PROMs [39]. Consequently, there remains a place and need for a simple-to-use, accurate, and practical whole-spine static-PROM with low administrative burden [12,40].
The advocated methodologies to shorten PROMs are two-fold, qualitative and quantitative. Qualitative approaches use expert committee consensus, with the 'concept-retention' method advocated for being judgment-based and retaining the original PROM's theoretical domains [41]. Quantitative approaches use statistical methods, with 'factorial' and 'Rasch' the most common [1,41]. This study aimed to: 1) develop a shortened-SFI for assessing spine functional status; 2) determine the correlation between the shortened-SFI and whole-spine criteria; 3) assess the correlation between the shortened-SFI and regional-spine, condition-specific, and patient-specific criteria; 4) investigate the correlation between the shortened-SFI and general-health and pain criteria; 5) ensure that the shortened-SFI retains the psychometric characteristics of one-dimensional structural validity, high internal consistency, and no floor/ceiling effect; and 6) enhance the practicality of the shortened-SFI to reduce administrative burden.
Accordingly, we hypothesized that: 1) the developed shortened-SFI will exhibit a high correlation with whole-spine criteria; 2) the correlation between the shortened-SFI and regional-spine, condition-specific, and patient-specific criteria will be moderate; 3) the correlation between the shortened-SFI and general-health and pain criteria will be moderate to low; 4) the psychometric properties of the shortened-SFI, including one-dimensional structural validity, high internal consistency, and absence of floor/ceiling effects, will be retained; and 5) practical enhancements made to the shortened-SFI will result in a reduction of administrative burden.

Study design
This cross-sectional study (n = 505) was conducted to shorten the SFI-25 to the SFI-10. All subjects provided written informed consent, and the study was approved by the Ethical Committee of the Universidade Federal do Maranhão (approval protocol number 4.284.203).
The post-hoc international sample (n = 1433, age = 18-91 yrs, av. = 40.3 ± 10.1 yrs, female = 58.4%, Table 1) included retrospective de-identified data obtained with permission from the original researchers of three additional published SFI-25 cross-cultural adaptation studies [19,22,23] and a further data set from a completed MSc research study [26] that has progressed to journal submission.

Functional rating index
The FRI has 10 item-questions with five short-descriptive response options (0-4 Likert visual NRS). The 'Raw Score' (0-40) is the sum of all item responses. The final score (0-100%; 0% = 'no problem/pain', 100% = 'worst possible') is calculated by: [Raw Score × 2.5], with substitution permitted for one missing response [4]. Each of the other spine-regional and general criteria PROMs is described in its original respective publication.
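The FRI scoring described above can be sketched in a few lines of Python. This is a minimal illustration for complete questionnaires only; the function name `fri_percent` is hypothetical, and the handling of a single missing response is deliberately left out because the substitution rule is not specified in this text.

```python
from typing import Sequence


def fri_percent(responses: Sequence[int]) -> float:
    """Convert ten complete FRI item responses (each 0-4) to a 0-100% score."""
    if len(responses) != 10:
        raise ValueError("the FRI has 10 item-questions")
    if any(r not in (0, 1, 2, 3, 4) for r in responses):
        raise ValueError("each response must be an integer from 0 to 4")
    raw_score = sum(responses)  # 'Raw Score', range 0-40
    return raw_score * 2.5      # final score, range 0-100%


# A patient marking the middle option on every item
print(fri_percent([2] * 10))
```

Multiplying by 2.5 simply rescales the 0-40 raw range onto 0-100%, so higher percentages indicate greater reported problem/pain.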

Development and psychometric assessment of the SFI-10
'Development' of the shortened version of the SFI-25 was through a-priori determination of the minimum number of item-questions necessary to retain structural validity and optimal practicality without a computational aid. The minimum number was guided by Spearman-Brown's 'k value' [43,44], and the optimal number by completion/scoring-time, accuracy, and no computational aid being required [12,45]. Additionally, one-dimensional structural integrity was required along with face, content and criterion validity (Pearson's or Spearman's r), plus internal consistency (Cronbach's α: scale-level > 0.75; item-level > 0.65) [46,47].
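As a hedged illustration of how a Spearman-Brown 'k value' can guide the minimum number of items, the sketch below uses the standard prophecy formula, predicted reliability = kρ / (1 + (k − 1)ρ), and its inversion for k. The function names are hypothetical, and the original-scale reliability of 0.90 is an illustrative assumption only; the reliability actually used by the authors is not stated in this text.

```python
import math


def predicted_reliability(rho: float, k: float) -> float:
    """Spearman-Brown prophecy: reliability after scaling test length by factor k."""
    return k * rho / (1.0 + (k - 1.0) * rho)


def length_factor(rho: float, rho_target: float) -> float:
    """Length factor k (new length / old length) needed to reach rho_target."""
    return rho_target * (1.0 - rho) / (rho * (1.0 - rho_target))


def min_items(n_items: int, rho: float, rho_target: float) -> int:
    """Smallest whole number of items predicted to reach rho_target."""
    return math.ceil(n_items * length_factor(rho, rho_target))


# Illustrative values only: 0.90 is NOT a reported SFI-25 reliability.
k = length_factor(rho=0.90, rho_target=0.75)
print(round(k, 3), min_items(25, 0.90, 0.75))
```

Under this convention k < 1 shortens the scale. The k = 3.33 reported in the Results may instead be the reciprocal ratio (original length divided by shortened length), in which case 25 / 3.33 ≈ 7.5, which rounds up to the reported minimum of 8 items; the text does not state which convention was used.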
Four methodological approaches obtained the required optimal number of item-questions.
The sociodemographic data and questionnaire scores used mean (x̄) and standard deviation (SD) in SPSS version 17 with significance: p < 0.05. The Kolmogorov-Smirnov test verified data-distribution. Factorial/Rasch analyses were blinded to minimize bias.

Results
The 'Development' indicated the minimum number of item-questions was n = 8 (Spearman-Brown k = 3.33). The optimal number of item-questions was n = 10 (from options of SFI-8, 10, 12 and 15 items), as this required no computational aid and retained the biopsychosocial 60:40 item-question ratio with six 'General' (#1-6) and four 'Region-specific' (#7-10) items (Fig. 1). The item-reduction and selection process confirmed face and content validity. The SFI-10 'Raw Score' (0-10) is the sum of all item responses, with the final score from: [100-(Raw Score × 10)], with substitution permitted for one missing response.
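The SFI-10 scoring rule can be sketched as follows. Two points are assumptions not given in this text and are flagged in the code: that each item is scored 0, 0.5, or 1 (consistent with a 0-10 raw score over 10 items), and that a single missing response is substituted with the mean of the answered items. The function name `sfi10_percent` is hypothetical.

```python
from typing import Optional, Sequence


def sfi10_percent(responses: Sequence[Optional[float]]) -> float:
    """Convert ten SFI-10 item responses to a 0-100% functional status score.

    Use None for a missing response; at most one is permitted.
    """
    if len(responses) != 10:
        raise ValueError("the SFI-10 has 10 item-questions")
    answered = [r for r in responses if r is not None]
    if len(responses) - len(answered) > 1:
        raise ValueError("only one missing response is permitted")
    # ASSUMPTION: item response values of 0, 0.5 or 1 (not stated in the text).
    if any(r not in (0, 0.5, 1) for r in answered):
        raise ValueError("each response must be 0, 0.5 or 1")
    raw_score = sum(answered)
    if len(answered) == 9:
        # ASSUMPTION: substitute the mean of the answered items for the gap.
        raw_score += raw_score / len(answered)
    return 100.0 - raw_score * 10.0  # [100 - (Raw Score x 10)]


print(sfi10_percent([1, 0.5, 0, 0, 1, 0.5, 0, 0, 0, 0]))
```

Because the raw score runs from 0 to 10, the conversion needs only a multiplication by 10 and a subtraction from 100, which is the basis for the claim that no computational aid is required.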

Discussion
The study's essential aims were achieved with a shortened SFI-10 developed. Face and content validity were demonstrated by the reduction process, with the criterion and structural validity confirmed by the psychometric analysis. The SFI-10 correlated highly with whole-spine criteria PROMs, moderately with region-specific, patient-specific, and condition-specific criteria, and moderate-low with general-health and pain criteria. Practicality was improved by 60%, though completion/scoring time/errors require quantification. The SFI-10 qualitative 'concept-retention' version demonstrated higher criterion validity with whole-spine criteria than the quantitative 'factorial' and 'Rasch' versions, both of which, interestingly, showed lower PCC values than the control/random version (Table 2). Criterion validity was comparable with the FRI and slightly below the SFI-25 in the same sample and the original Australian SFI-25 study [5], but exceeded the Turkish [23], Korean [21] and Chinese [24] findings (Table 3).
Structural validity was unequivocally one-dimensional, being supported by factorial and Rasch analysis in the full n = 505 sample and the post-hoc international sample (n = 1433). This complied with previous research recommendations that factor structure be improved as, although a dominant single-factor was present, 6-8 factors were demonstrated [5,20,22,23]. Spine-regional and patient-specific criteria correlations approximated the SFI-25 findings, but the RMQ and NDI were notably lower (Table 3). However, SFI-10 spine-regional and general-health criteria exceeded those of six SFI-25 studies [20-25] (Table 3). Importantly, the SFI-10 retained the biopsychosocial 60:40 ratio conceptual model of general-versus-regional items [5,42], which could not be maintained in the SFI-8, 12, and 15 item versions, each of which also required a computational aid. This biopsychosocial balance reduces risks of confounding 'functional' and 'symptomatic' change [56] while accommodating pain without potentially affecting responsiveness [57]. The increased SFI-10 practicality improved the scoring process without the need for a computational aid through a simple calculation of '× 10' converting raw-scores to percentages [13,45]. This should ensure lower administrative burden through reduced completion/scoring times [19,40] and minimal potential errors [13], while complying with the essential nine pragmatic decisions for choosing and using a PROM [12]. In general, the popularity of short scales is explained by their need for reduced resources, particularly administrative burden and subsequent related costs [10,40]. These findings reflect the two essential reasons for PROM shortening, practicality improvements and retaining validity and factor structure [16], as face, content, criterion, and structural validity must be retained [1,46].
The preferred 'concept-retention' methodology supports similar PROM-shortening research where qualitative versions were superior to quantitative. This was demonstrated for the Quick-DASH (11-items) from the DASH (30-item) [41], though factor structure was not one-dimensional and practicality remained impaired as computational assistance was required. Similarly, concept-retention methodology produced the 10-item lower limb functional index (LLFI-10) from the LLFI-25 as a practical solution with one-dimensional validation in burns [58]. The 12-item Orebro Musculoskeletal Screening Questionnaire (OMSQ-12) improved the practicality of the original 21-item OMPainSQ and retained the critical psychometric characteristics for biopsychosocial risk screening [59,60]. This contrasts with a qualitative 'author-determined' OMPainSQ-10 approach [61], where criterion validity was below the random version, as found in this study, and notably below the 'concept-retention' version [59]. The shortened NDI-5 combined qualitative and quantitative approaches, retained a one-dimensional structure [1,56], and balanced psychometric and practical characteristics when compared to the 10-item version, the quantitative NDI-8 Rasch-version [57], and the NDI-7 factorial-version [1]. Various qualitative processes reduced the RMQ from 24 to 18 and 11 items [62], with the former found preferable [62,63]. However, no RMQ qualitative shortened version is available, and a computational aid remains necessary to calculate the scores of all RMQ versions. The question remains as to what is 'the optimal minimum number' of item-questions that provides a sufficiently broad representation of the required domains [64], and whether this can be represented by only five items as per the NDI-5 [1,56].
This study demonstrated and reinforced that a qualitative approach does produce a shortened-PROM that has balanced the requirements for critical psychometric characteristics and one-dimensional structural validity while concurrently improving practicality. Very short scales, below 10-items, increase the measurement error from lower precision [64], hence the SFI-10 version appears an appropriate solution. Consequently, this concept-retention qualitative item-reduction process can be confidently applied to similar regional PROMs to facilitate their application in clinical and research settings.

Study limitations and strengths
Study limitations include potential patient selection bias as recruitment was from primary contact and referred physiotherapy outpatients; consequently, inpatient and community settings will need to be investigated. There is a lack of prospective data and repeated psychometric and practicality analysis. This leaves a knowledge gap in the test-retest reliability, responsiveness, and error scores, including both minimal detectable change and minimal clinically significant difference. Consequently, there is a need for longitudinal analysis that includes patient-specific change to clarify these psychometric properties. Further, the practical aspects of readability, missing responses, and administrative burden from completion and scoring times/errors must be quantified. Each of these latter limitations is now being addressed in a subsequent study.
Study strengths included the large sample size and the clarification of findings in a further pooled international sample. Additionally, the SFI-10 development exceeded the minimal COSMIN standards and cut-off requirements. This incorporated the cross-sectional analysis and the pooled international sample from diverse populations with broad diagnoses.

Conclusions
This study developed a shortened 10-item SFI-10 whole-spine PROM and verified structural validity through factorial and Rasch analysis, criterion validity and internal consistency with no floor/ceiling effects. The pooled MSD population of diverse age, culture, and clinical settings supported potential generalizability for outpatient settings, but inpatient and community settings require investigation. The improved practicality and unequivocal one-dimensional factor structure provided a summated score that is easily and rapidly determined without a computational aid. These attributes imply that the SFI-10 can be used in preference to the existing whole-spine and spine-regional PROMs in clinical and research settings. Further longitudinal research is currently underway to determine the critical psychometric characteristics of test-retest reliability, responsiveness, and error scores; and to quantify the practical characteristics of readability and administrative burden that include completion and scoring time/errors. Subsequently, a systematic review that includes the SFI-10 and published SFI-25 studies would further inform and clarify the clinimetric properties.

Table 1
Demographics for all study participants
SFI indicates 25-item Spine Functional Index, n number, x̄ mean, SD standard deviation, % percent, Cx cervical, Tx thoracic, Lx lumbar
* Subregion % values include multi-area individuals within each of their symptomatic regions, making the total > 100%

Table 2
Criterion validity comparing SFI-10 versions with the SFI-25 and FRI
SFI indicates Spine Functional Index, PCC Pearson's Correlation Coefficient for normally distributed data, FRI Functional Rating Index
* PCC/SCC: r > 0.95 with the SFI-25; and ** r > 0.70 with the FRI as the indicator of potential suitability to substitute for the SFI-25
# PCC/SCC: highest value was the preferred version

Table 3
Criterion validity for the SFI-25 and SFI-10 from existing published research
SFI indicates Spine Functional Index, PCC Pearson's Correlation Coefficient for normally distributed data, SCC Spearman's Correlation Coefficient for non-normally distributed data, FRI Functional Rating Index, ODI Oswestry Disability Index, RMDQ Roland Morris Disability Questionnaire, NDI Neck Disability Index, WDQ Whiplash Disability Questionnaire, PSFS Patient Specific Functional Scale, EQ-Index EuroQol Index, P-NRS Pain Numerical Rating Scale

Table 4
Structural validity determination from factorial (CFA) analysis
df indicates degrees of freedom, CFI comparative fit index, TLI Tucker-Lewis index, RMSEA root mean square error of approximation, CI confidence interval

Table 5
Rasch analysis of the SFI-10 (n = 505 and n = 1433 are similar)